Method and apparatus for selection of signals in a teleconference

ABSTRACT

A method and apparatus for providing appropriate output signals to output devices in a teleconference setting is disclosed. Input signals are obtained from input devices, and descriptive information describing the teleconference is received from several sensors. On a substantially continuous basis, using the descriptive information, a desirability is computed for each of several possible output configurations, each output configuration specifying a routing of output signals to output devices. The most desirable output configuration is then selected, and output signals are provided to output devices as specified by the selected output configuration.

TECHNICAL FIELD

[0001] The invention relates to teleconferencing. More particularly, the invention relates to a method and apparatus for selecting signals in a teleconference.

DESCRIPTION OF THE PRIOR ART

[0002] The primary goal of teleconferencing systems is to provide, at a remote teleconference site, a high fidelity representation of the persons present and of events occurring at a local teleconference site. A teleconferencing system that represents the local conferencing site with sufficient fidelity enables effective communication and collaboration among the participants despite their physical separation.

[0003] In practice, it is difficult to capture the persons and events at a local conferencing site effectively using a single video feed from a single video camera and a single audio feed from a single microphone. This is especially true in conferences with more than one local conferencing participant. While employing a single camera with a wide-angle view of a local conferencing site may successfully capture more than one participant within the camera field of view, such views create a sense of distance that is neither comfortable nor engaging for the remote participant.

[0004] Several prior art video conferencing systems, including the Viewstation MP, manufactured by Polycom, Inc. of Pleasanton, Calif., have attempted to mitigate this shortcoming with a motion control video camera. The camera automatically tracks a single video conferencing participant or pans and tilts to capture multiple participants, successively, within the field of view. While this approach does provide a closer view of individual participants, the moving view captured by a panning and tilting camera as it transitions from one participant to another is disconcerting when viewed by the remote participant.

[0005] To avoid the panning and tilting motion provided by motion control cameras, several prior art conferencing systems, including the CT-4A Automatic Mixer, manufactured by Jefferson Audio Systems of Louisville, Ky., have incorporated video feeds from multiple video cameras, and audio feeds from multiple microphones. In addition, many systems allow for the transmission of video and audio feeds from sources such as laptop computers, document cameras, and video cassette recorders.

[0006] Because a teleconferencing system must operate within the limited bandwidth connecting a local and remote location, it is in practice not possible to transmit all of the audio and video signals to the remote location. Moreover, the amount of visual and aural information the remote participant can comfortably process is itself limited. It is therefore desirable to determine, among the many video and audio feeds available at the local conferencing site, which feed or feeds to transmit to the remote location.

[0007] Several prior art approaches, including U.S. Pat. No. 6,025,870 to Hardy, have suggested that the selection of the video and audio signals may be performed in a manner that simulates the shift in attention of an observer physically present at the local site. For example, the selected video signal may be obtained from a video camera offering a prominent view of the current speaker, and the selected audio signal may be obtained from a microphone offering the clearest rendering of the current dialogue. Providing video and audio signals to the remote participant in this manner provides a more natural interaction with the local teleconferencing site.

[0008] In some instances, selection of signals in this manner requires a human operator. This approach is distracting if carried out by a meeting participant, or costly if carried out by a hired director. A few systems, however, attempt to perform the signal selection in an automated manner. T. Inoue, K. Okada, and Y. Matsushita, Learning from TV Programs: Application of TV Presentation to a Videoconferencing System, Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 147-154, Pittsburgh, Pa. (Nov. 14-17, 1995) propose an automated system emulating the direction techniques used in the television industry.

[0009] U.S. Pat. No. 6,025,870 to Hardy describes a system for automatically capturing the changing focus of a video conference. The system “includes a video switch for selecting focus video information, a physical video input node coupled to provide physical video information to the video switch, a graphics processing module coupled to provide graphical video information to the video switch, and a remote source interface coupled to provide remote video information to the video switch. The videoconference system further includes an audio processing module for processing audio information. A record controller is coupled to the video switch, the graphics processing module and the audio processing module. The record controller is coupled to receive event information from the audio processing module and the graphics processing module. The record controller automatically determines a focus video source from the physical video input, the graphics processing module and the remote source interface responsive to receiving the event information. The record controller controls the video switch to couple the focus video source to a video switch output responsive to determining the focus video source.”

[0010] While the systems disclosed by Inoue et al. and Hardy do provide improvement over more traditional systems, several deficiencies remain. In particular, the Inoue system merely considers a relative probability of transitions from a current signal to a subsequent signal based on the classes of the current signal and available signals, where the signal classes are defined by the subject matter represented by the video signal. The system has, at best, a very limited sense of the current state and context of the video conference. The system is therefore unable to select meaningfully an appropriate signal based on the specific progression of events in a particular video conference, and instead transitions from one signal to another along standardized sequences.

[0011] The system disclosed by Hardy does incorporate an understanding of the current state of the conference, as indicated by the events received by the record controller. However, the ability of the system to respond to the changing state of the conference is limited to specific responses to specific events. Most notably, the system is unable to develop a continually refined assessment of the state and context of the conference. Instead, the system merely waits for a recognized event and then responds accordingly.

[0012] Moreover, neither system suggests that the selection of signals could be based on a history of the conference state, or a prediction of future conference states. Further, neither prior art system attempts to develop a quantitative estimate of the suitability of selection for each of the potentially selected signals. In these regards, the systems are more rule-based than model-based.

[0013] Finally, the prior art systems do not suggest a signal selection method that changes throughout the course of a conference to remain consistent with the changing dynamics of a typical business meeting.

[0014] What is needed is a system that continually monitors a teleconference to develop an understanding of the state and context of the conference. Based on this understanding, the system should consider and evaluate each candidate configuration of output signals, preferably quantitatively, and select from among the candidate output configurations a most desirable output configuration. In this manner, the system should develop a model of the conference, preferably incorporating a sense of continuity in the progression of selected output configurations. Further, the model is preferably varied throughout the course of the conference to allow for the changing dynamics of a typical business meeting.

[0015] Furthermore, the system, when operated at a local video conferencing site, should be compatible with any existing teleconferencing equipment present at the remote site.

[0016] Finally, the system should have interfaces that are simple and intuitive, allowing use by those with little or no computer literacy.

[0017] Importantly, the automated selection should be accomplished in a manner providing an accurate and engaging representation of the teleconference, thus allowing for more natural and meaningful interaction between physically separated teleconference participants.

SUMMARY

[0018] The invention provides appropriate output signals to output devices in a teleconference setting. Input signals are obtained from input devices, and information describing the teleconference is received from several sensors. On a substantially continuous basis, using the descriptive information, a desirability is computed for each of several possible output configurations, where each output configuration specifies a routing of output signals to output devices. The most desirable output configuration is then selected, and output signals are provided to output devices as specified by the selected output configuration.

[0019] Exemplary input devices include video cameras, computers, document scanners, and microphones. Exemplary sensors include microphones, motion detectors, and security badge readers. Output signals are composed from the input signals provided by the input devices. Examples of output signal composition include selecting a single input signal or composing a split-screen view from two or more input signals. The output signals are provided to output devices such as television monitors, computer displays, video recording devices, audio recording devices, and printers.

[0020] In the preferred embodiment of the invention, the desirability of each possible output configuration is calculated based on contributions from several components. Each component is multiplied by a component weighting and then additively combined with the other components to yield the desirability. These components can include, for example, an activity component, a saturation component, and a continuity component.

[0021] The activity component is based on contributions from several activity terms. Each activity term is multiplied by an activity term weighting and then additively combined with the other activity terms to yield the activity component of the desirability. Activity terms can, for example, include an audio activity term, a motion activity term, an audio undercoverage term, and an audio overcoverage term.

[0022] The audio activity term reflects the desirability of the possible output configurations based on audio activity detected by microphones within the teleconference site.

[0023] The motion activity term reflects the desirability of the possible output configurations based on motion detected by motion sensors within the teleconference site.

[0024] The audio undercoverage term indicates an increasing desirability for those output configurations incorporating output signals related to audio activity and yet not incorporated within the output configuration currently provided to the output devices. Finally, the audio overcoverage term indicates a decreasing desirability for those output configurations incorporating output signals not related to audio activity and yet incorporated within the output configuration currently provided to the output devices.

[0025] The saturation component indicates an increasing desirability for output configurations incorporating output signals not currently provided to the output devices, and a decreasing desirability for output configurations incorporating output signals currently provided to at least one of said output devices.

[0026] The continuity component is based on contributions from several continuity terms. Each continuity term is multiplied by a continuity term weighting and then additively combined with the other continuity terms to yield the continuity component of the desirability. The continuity terms can include, for example, a spatial continuity term, a context continuity term, a rapid switching continuity term, and a sustained switching continuity term.

[0027] The spatial continuity term indicates a greater desirability for output configurations similar to the output configuration currently provided to the output devices.

[0028] The context continuity term indicates a greater desirability for output configurations recently provided to the output devices.

[0029] The rapid switching continuity term indicates a greater desirability for the output configuration currently provided to the output devices, and a lesser desirability for all other output configurations, the difference in desirability attaining a maximum value when the current output configuration is initially selected and decreasing thereafter. Finally, the sustained switching continuity term indicates a greater desirability for the output configuration currently provided to the output devices, and a lesser desirability for all other output configurations, the difference in desirability proportional to a recent history switching rate between output configurations.

[0030] The component weightings, activity term weightings, and continuity term weightings are adjustable parameters that can be altered to affect the selection of a most desirable output configuration. Values for the adjustable parameters may be provided to suit a particular conference style, and may be varied over the duration of an individual conference.

[0031] The invention thus allows a large number of input signals obtained from a wide variety of input devices to be evaluated and routed to a wide variety of output devices using a consistent and logical framework. Diverse information describing the dynamics of the conferencing environment is incorporated in an intuitive manner to provide natural and meaningful interaction between physically separated teleconference participants.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] FIG. 1 is a flow chart that shows a method of selecting a most desirable configuration of output signals from among a plurality of possible output configurations according to the invention;

[0033] FIG. 2 is a schematic representation of a teleconference site according to the invention;

[0034] FIG. 3 is a schematic representation of a teleconference system according to the invention;

[0035] FIG. 4 is a flow chart that shows a method of determining a most desirable output configuration according to the invention;

[0036] FIG. 5 is a flow chart that shows a method of numerically evaluating a desirability for each of a plurality of possible output configurations according to the invention.

DESCRIPTION

[0037] The invention operates in a teleconferencing setting, continuously receiving input signals from input devices and monitoring information from sensors which describe the teleconference to determine and provide a most desirable configuration of output signals to a set of output devices.

[0038] FIG. 1 is a flow chart that shows a method of selecting a most desirable configuration of output signals from among a plurality of possible output configurations according to the invention. One or more input devices 100 produce input signals 150. Substantially concurrently with the production of the input signals, one or more sensors 200 provide information 250 describing the teleconference site. Using the descriptive information, a central processor determines 1000 a most desirable output configuration among a plurality of possible output configurations, where each possible output configuration describes a particular routing of output signals to output devices. The most desirable output configuration is then selected 300, and output signals 450 are routed to output devices 400 as specified by the selected output configuration.
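
By way of illustration only, the selection loop described above may be sketched as follows. The function names (read_sensors, evaluate_desirability, route_outputs) and the interval value are hypothetical placeholders, not elements of the disclosed apparatus.

```python
import time
from typing import Callable, Dict, Hashable, Sequence

def selection_loop(
    possible_configurations: Sequence[Hashable],
    read_sensors: Callable[[], dict],
    evaluate_desirability: Callable[[Hashable, dict], float],
    route_outputs: Callable[[Hashable], None],
    interval_seconds: float = 0.5,
) -> None:
    """Repeatedly select and apply the most desirable output configuration."""
    while True:
        info = read_sensors()                              # descriptive information 250
        desirability: Dict[Hashable, float] = {
            cfg: evaluate_desirability(cfg, info)          # one value per possible configuration
            for cfg in possible_configurations
        }
        best = max(desirability, key=desirability.get)     # most desirable configuration
        route_outputs(best)                                # e.g. instruct a matrix switch
        time.sleep(interval_seconds)                       # substantially continuous repetition
```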

[0039] An output signal may be one of the input signals, or a signal created by modification, combination, or both modification and combination of one or more input signals. Primarily, output signals derived from input signals that originate from input devices located at the local conferencing site are provided to output devices located at remote conferencing sites. However, in some embodiments, it may be desirable to provide such output signals to local output devices.

[0040] It should be noted that the steps shown in FIG. 1 occur on a substantially continuous basis. In particular, the steps are not executed in response to detected events or incidents transpiring within the conference site, as is the case in the prior art. Rather, the steps are executed repeatedly and continuously, allowing the system to maintain a continually updated assessment of the desirability of the possible output configurations based on the descriptive information acquired from the sensors.

[0041] FIG. 2 is a schematic representation of a teleconference site according to the invention. As shown in FIG. 2, several local participants 10 are seated about a conference table 20, so as to be able to view a local video display 410. Several video cameras 110 are positioned throughout the conference facility to capture images of one or more of the local participants. Collectively, the video cameras capture images of one or more participants from a variety of angles and in a range of shot compositions. For example, the video cameras may capture a centered, close in view of a single participant 111, a wide view of all three participants 112, a view of two participants over the shoulder of a third 113, and a view of the entire conferencing site 114 including an entranceway 30. Referring to FIG. 1, the video cameras may be regarded as input devices 100 that produce input signals 150 in the form of video signals.

[0042] A plurality of microphones 210 are arrayed so as to capture the audio throughout the teleconference site. For example, microphones are positioned to capture the speech emanating from an individual conference participant 211 or to capture audio of a more ambient nature 214 not associated with an individual participant. In addition, a motion detector 220 is mounted so as to detect, for example, the entry or exit of a conference participant through the entranceway 30. Referring again to FIG. 1, the microphones and motion detector may be regarded as sensors 200 that acquire descriptive information 250 about the conferencing site. More specifically, the microphones 210 acquire audio signals that indicate where and when within the conferencing site there is audio activity, and the motion detector 220 indicates when a conference participant enters or exits the conference site.

[0043] FIG. 3 is a schematic representation of a teleconference system according to the invention. The audio signals 260 obtained by the microphones 210 are provided to an audio processor 600. The audio processor analyzes each audio signal to determine whether or not there is audio activity in the vicinity of each microphone. To make this determination, the processor may use any of low-pass filtering, rising or falling edge detection, and energy or amplitude thresholding, preferably hysteretic in nature. The audio processor may also perform signal conditioning such as echo canceling. An example of a device providing this functionality is the Vortex EF2280, manufactured by Polycom, Inc. of Pleasanton, Calif.

[0044] A true or false value for each of the microphones, reflecting the presence or absence of audio activity, is provided by the audio processor to a central processor 1000. In FIG. 3, this is accomplished by passing an audio activity vector 270 to the central processor, with elements valued either 0 or 1 and length equal to the number of microphones. Alternatively, the audio processor may pass a vector of scalar values, with the magnitude of each element representing the relative intensity of the audio activity in the vicinity of the corresponding microphone.
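
Purely as an illustrative sketch, one way the hysteretic thresholding described above might produce the audio activity vector 270 is shown below, assuming smoothed per-microphone energy estimates are available; the threshold values and names are hypothetical.

```python
import numpy as np

def audio_activity_vector(energies, previous, on_threshold=0.6, off_threshold=0.4):
    """Hysteretic thresholding of smoothed per-microphone energy estimates.

    A microphone is marked active (1) once its energy exceeds on_threshold and
    remains active until the energy falls below off_threshold, which avoids
    chattering around a single threshold value.
    """
    energies = np.asarray(energies, dtype=float)
    previous = np.asarray(previous, dtype=int)
    turn_on = energies > on_threshold
    stay_on = (previous == 1) & (energies >= off_threshold)
    return (turn_on | stay_on).astype(int)   # audio activity vector, elements 0 or 1

# Example: with previous vector [1, 0, 1, 0], the third microphone stays active
# at an energy of 0.5 because of the hysteresis.
print(audio_activity_vector([0.8, 0.1, 0.5, 0.2], [1, 0, 1, 0]))   # [1 0 1 0]
```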

[0045] The audio processor also provides the conditioned audio signals 262 to an audio mixer 650. The audio mixer combines the signals into a combined audio signal 265 that is passed through a communications network 800 to a remote loudspeaker 470 at the remote conferencing site. The participants at the remote conferencing site thus hear the combined audio captured by the microphones 210 at the local conferencing site. An example of an audio mixer suitable for use in the invention is the Polycom Vortex EF2280.

[0046] The motion detection signal 280 obtained by the motion detector 220 is provided to a threshold detection unit 700. Preferably employing a low-pass filter and hysteretic thresholding, the threshold detection unit assigns a true or false value to the motion detection signal, reflecting the presence or absence of motion in the vicinity of the detector, and provides this value to the central processor 1000. In FIG. 3, this is accomplished by passing to the central processor a motion activity vector 290 with a single element valued either 0 or 1. In embodiments of the invention employing more than one motion detector, the length of the motion activity vector is increased accordingly.

[0047] The video signals 160 acquired by the video cameras 110 are provided to a matrix switch 500. The matrix switch is also coupled with several output devices. In FIG. 3, the matrix switch is coupled with an effects processor 550, and a remote video display 450, via a communications network 800. The remote video display is located at the remote conferencing site and is analogous to the local video display shown in FIG. 2. The effects-processed video signal 170 produced by the effects processor is also provided as an input to the matrix switch. The matrix switch selects as output signals one or more of the input signals it receives and routes them to any one or more of the output devices to which it is coupled. An example of a matrix switch suitable for use in the invention is the Matrix 3200 Video Switch, manufactured by Extron Electronics of Anaheim, Calif.

[0048] Based on the descriptive information of the local conference site received in the form of the audio activity vector 270 and the motion activity vector 290, the central processor 1000 provides a switching configuration instruction 525 to the matrix switch, which specifies a selection and routing of output signals and which defines an output configuration.

[0049] A great number of output configurations are possible in embodiments of the invention. In one output configuration, a head-on view of a single participant may be routed to the remote video display. Alternatively, a wide-angle view of all participants may be routed to the remote video display.

[0050] Other output configurations provide, for example, effects-processed output video signals to the remote video display, such as video with text overlays, and split-screen shots composed from two separate video input signals. The unprocessed input video signal or input video signals are provided to the effects processor by the matrix switch. The effects-processed video signal is then returned to the matrix switch and routed to the remote video display.

[0051] To produce the desired effect, it may be necessary for the matrix switch to provide more than one video input signal to the effects processor, as shown in FIG. 3. Effects processors suitable for use in the invention are well known in the art, including the Prodigy, manufactured by Videotek, Inc. of Pottstown, Pa., which is capable of producing a split-screen view from two video input signals, and the CODI character generator, manufactured by Chyron Graphics of Melville, N.Y.

[0052] The teleconference setup shown in FIG. 2 and the accompanying schematic shown in FIG. 3 are simple in nature. Significantly more complex and full-featured embodiments are within the scope of the invention, which may find application in forms of conferencing other than traditional teleconferencing.

[0053] For example, the output signals derived from input signals obtained from input devices located at the local conferencing site are generally provided to output devices at the remote conferencing site. However, alternative embodiments of the invention employ a secondary local video display, where it may be desirable to display for the local participants the video signal routed to the remote video display. In this embodiment, a local video input signal is routed to a local output device.

[0054] Further, the audio signals obtained by the microphones serve only as descriptive information, and are not treated as input signals to be considered as output signals in possible output configurations. Rather, the microphone signals are continually mixed together by the audio mixer 650 and provided to the remote location through the communications network 800. In other embodiments of the invention, the microphone signals are provided to the matrix switch in a manner analogous to the video signals of FIG. 3, and are treated as input signals.

[0055] Effects processing may also be applied to the audio signals. In these embodiments, the central processor 1000 controls which of the several audio input signals are selected and provided to output devices through its selection of a most desirable output configuration. In such embodiments of the invention, the microphones serve as both input devices and sensors, and the audio signals serve as both input signals and descriptive information.

[0056] In other embodiments of the invention, the video signals additionally serve as descriptive information. A video processing unit such as a gesture recognition unit or a gaze detection unit extracts descriptive information from the video signals that is passed to the central processor.

[0057] Other embodiments incorporate input devices, sensors, and output devices not present in the preferred embodiment. Among such input devices are, for example, radio tuners, audio tapes, CD's, television antennae, VCR's, DVD's, DVR's, CD-ROM's, document cameras, document scanners, facsimile machines, and personal computers. Speakers, amplifiers, signal processors, tape recorders, digital audio recorders, computer monitors, projectors, facsimile machines, VCR's, and DVR's are suitable for use as output devices. Using recording devices such as VCR's and DVR's for input and output devices allows for archival and retrieval of teleconferences on a more durable storage medium. A wide variety of descriptive information may be gathered by sensors such as video processing units, audio processing units, seat sensors, personnel ID readers, security badge readers, range finders, and environmental sensors such as hygrometers and thermometers. Such sensors may indicate, for example, temperature, humidity, illumination level, the opening of a door, the presence of an individual in an entryway to the teleconference site, the presence of a teleconference participant within the teleconference site, the seating of a participant, the standing up of a participant, the speaking activity of a participant, the speaking of a predetermined word by a participant, the posture of a participant, the gaze direction of a participant, the facial expression of a participant, and a gesture by a participant.

[0058] Importantly, the invention allows for any number of input devices and output devices to be handled, with a corresponding increase in the number of possible output configurations. The flexibility in the selection of input and output devices ensures that the invention is compatible with existing teleconferencing systems. Notably, an output configuration selected and provided by the central processor and matrix switch can be displayed on a standard remote video display.

[0059] The invention may also be extended to include embodiments where descriptive information is gathered for one or more remote teleconference sites as well as the local teleconference site, and the central processor determines an output configuration based on a more global description of all teleconference sites in aggregate. Similarly, the input signals considered as available output signals for use in the possible output configurations need not originate at the local conferencing site.

[0060] For example, in one embodiment, a central processing algorithm considers all input signals from all sites in producing output signals, allowing for a truly diverse range of possible output configurations. For example, split-screen views composed from video signals originating at separate conferencing sites are possible. In such embodiments, the input signals and descriptive information may be transmitted to the central processor through a communications network on a continual basis, or alternatively, the descriptive information may be transmitted continually and the input signals transmitted only when the central processor determines that a specific input signal is needed to create an output signal in the selected output configuration.

[0061] The invention may also find application in fields other than teleconferencing. For example, the invention may aid in the production of live broadcast television, the editing of movies, the editing of television programs, the creation of a master security video signal from several video cameras, the selection of personalized television programming with a cable television set-top tuning device, and the creation of night club video and music programs.

[0062] FIG. 4 is a flow chart that shows a method of determining a most desirable output configuration according to the invention. The method shown in FIG. 4 is executed by the central processor 1000. The method begins with the central processor receiving descriptive information 1100 from the one or more sensors. The central processor then evaluates a desirability for each of the possible output configurations 1200. The desirabilities determined are then used to indicate a most desirable output configuration 1300. The indicated most desirable output configuration is then selected 300, as shown in FIG. 1, with a device such as the matrix switch 500 of FIG. 3.

[0063] The method of FIG. 4 is preferably executed on a substantially continuous basis throughout a teleconference. In practice, it is convenient to begin execution of the method at the beginning of each in a series of regular time intervals, where the length of the interval provides the processor sufficient time to execute the method. A most desirable output configuration is determined, selected, and implemented once per interval. Further, the duration of the interval may be made short enough that the evaluation and selection of a most desirable output configuration appears to conference participants as a continuous process.

[0064] In the preferred embodiment of the invention, the desirability of each possible output configuration is evaluated numerically, and the most desirable output configuration is the output configuration with the greatest numerical desirability.

[0065] FIG. 5 is a flow chart that shows a method of numerically evaluating a desirability for each of a plurality of possible output configurations according to the invention. The method begins with the evaluation of first 1222, second 1224, and third 1226 desirability components for the output configuration. Evaluation of the components may be performed substantially concurrently or in series. The first, second, and third desirability components are then multiplied 1250 by first 1232, second 1234, and third 1236 weightings, respectively, that reflect the relative importance of the components in determining the overall desirability of the output configuration. The results of the multiplication operations are then added 1255 together to obtain the desirability of the output configuration. Note that while the desirability evaluated in the preferred embodiment comprises three component desirabilities, other embodiments may evaluate desirabilities with any number of components.

[0066] In the preferred embodiment of the invention, the components detailed in FIG. 5 are termed activity, saturation, and continuity. As shown in FIG. 3, evaluating the desirability of the possible output configurations reduces to evaluating the desirability of displaying each of the available video output signals on the remote video display.

[0067] Mathematically, the desirabilities of the available output signals may be tabulated in a column vector D that is evaluated as

D(t) = K^A A(t) + K^S S(t) + K^C C(t)  (1)

[0068] where A(t) is the activity component, C(t) is the continuity component, and S(t) is the saturation component. Each component varies as a function of time, leading to a time dependent desirability, D_i(t), for each available video output signal. Associated with each component is a weighting: K^A for the activity component, K^C for the continuity component, and K^S for the saturation component. As noted, the relative values of these weightings reflect the relative importance of the three components in determining the desirability.
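
The weighted combination of equation (1) can be illustrated with the following sketch; the component values and weightings shown are arbitrary examples, not values prescribed by the invention.

```python
import numpy as np

def desirability(A, S, C, K_A=1.0, K_S=0.5, K_C=0.8):
    """Equation (1): D(t) = K^A A(t) + K^S S(t) + K^C C(t).

    A, S, and C are vectors with one element per available video output signal,
    holding the activity, saturation, and continuity components; the K values
    are the component weightings. All numerical values here are illustrative.
    """
    return (K_A * np.asarray(A, dtype=float)
            + K_S * np.asarray(S, dtype=float)
            + K_C * np.asarray(C, dtype=float))

# Four available video output signals, ordered (A, B, Split A-B, Group).
D = desirability(A=[0.9, 0.1, 0.4, 0.6], S=[-0.2, 0.0, 0.0, -0.1], C=[0.3, -0.5, 0.0, 0.2])
most_desirable = int(np.argmax(D))   # index of the signal with the greatest desirability
```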

[0069] Activity Component

[0070] The activity component reflects the desirability of an available video output signal based on the current activity within the teleconference site indicated by the descriptive information. The activity component is calculated by additively combining four terms, i.e. audio activity, motion activity, audio undercoverage, and audio overcoverage. Associated with each of these terms is a multiplicative weighting reflecting the relative importance of each term in determining the activity component of the desirability. Mathematically,

A(t) = k_a A^a(t) + k_m A^m(t) + k_u A^u(t) + k_o A^o(t)  (2)

[0071] Audio Activity Term

[0072] The audio activity term reflects the desirability of an available video output signal based on the detection of audio activity within the teleconference site. This notion is numerically quantified by mapping the audio activity vector received by the central processor from the sensors onto the available video output signals. The mapping is accomplished with an audibility matrix, the elements of which reflect the relevance of a particular microphone to a particular video output signal. For example, consider a subset of the available video output signals in the preferred embodiment of FIGS. 2 and 3, where the column vector D of desirabilities contains desirabilities for the video output signals

D(t) = [D_A(t)  D_B(t)  D_{Split A-B}(t)  D_{Group}(t)]^T  (3)

[0073] where A refers to a video output signal displaying a close in view of participant A, B a close in view of participant B, Split A-B a split-screen view of participants A and B, and Group a wide-angle view of all participants. An audio activity vector a(t), with successive elements reflecting the audio activity captured by microphones 211, 212, 213, and 214, respectively, is mapped onto the subset of available video output signals by evaluating

A^a(t) = U a(t)  (4)

[0074] where a(t) is the audio activity vector and U is the audibility matrix. For the conference site geometry of FIG. 2, the audibility matrix U may be given by

$U = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0.75 & 0.75 & 0 & 0 \\ 0.6 & 0.6 & 0.6 & 1 \end{bmatrix} \quad (5)$

[0075] Thus, if only microphone 211 is active, the close in view of participant A is the most desirable video output signal, with regard to audio activity. If microphones 211 and 212 are both active, the split-screen view of participants A and B is the most desirable. If only microphone 214 is active, or if microphones 211, 212, and 213 are active, the group view of all participants is the most desirable.
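
The following sketch simply reproduces the audibility mapping of equations (4) and (5) numerically, confirming the selections described above; it is illustrative only.

```python
import numpy as np

# Audibility matrix U of equation (5); rows correspond to the video output signals
# (A, B, Split A-B, Group) and columns to microphones 211, 212, 213, and 214.
U = np.array([
    [1.0,  0.0,  0.0,  0.0],
    [0.0,  1.0,  0.0,  0.0],
    [0.75, 0.75, 0.0,  0.0],
    [0.6,  0.6,  0.6,  1.0],
])
signals = ["A", "B", "Split A-B", "Group"]

def audio_activity_term(a):
    """Equation (4): A^a(t) = U a(t)."""
    return U @ np.asarray(a, dtype=float)

print(signals[int(np.argmax(audio_activity_term([1, 0, 0, 0])))])   # only 211 active -> "A"
print(signals[int(np.argmax(audio_activity_term([1, 1, 0, 0])))])   # 211 and 212 -> "Split A-B"
print(signals[int(np.argmax(audio_activity_term([0, 0, 0, 1])))])   # only 214 -> "Group"
```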

[0076] Motion Activity

[0077] The motion activity term reflects the desirability of each available video output signal based on the motion detected in the teleconference site. As with the audio activity term, this is quantified by mapping the motion activity vector onto the available video output signals. Considering the single motion sensor 220 of FIG. 2 and the subset of available video output signals given above, the motion activity desirability may be computed as

A^m(t) = M m(t)  (6)

[0078] where m(t) is the motion activity vector, here a single element, and M is a motion visibility matrix reflecting the visibility of the sensible region of the motion detector within the field of view of each output video signal. The motion visibility matrix is given by

$M = \begin{bmatrix} 0.5 \\ 0 \\ 0 \\ 1 \end{bmatrix} \quad (7)$

[0079] Thus, if the motion detector senses motion, the group view is the most desirable output signal, because it effectively shows the entranceway monitored by the motion detector. A close in view of participant A is somewhat less desirable, because the entranceway is visible in the background of the view, while the other two video output signals are not at all desirable because the entranceway is not at all visible.

[0080] Audio Undercoverage

[0081] While the audio activity and motion activity terms reflect the desirability of each available video output signal based on the current activity in the conference site, the audio undercoverage term reflects the desirability of each available video output signal based on the history of previously selected video output signals compared with the history of the audio activity. Specifically, the audio undercoverage term increases the desirability of those video output signals that display areas within the conference site containing audio activity but are nonetheless not selected.

[0082] Mathematically, the audio undercoverage term A^u(t) at a given time t is based upon the audio undercoverage at a previous time t−Δt, with

A^u(t) = ε_u A^u(t−Δt) + H₀(t) ∘ U a(t) Δt  (8)

[0083] where the operator ∘ denotes element by element multiplication of two vectors, and ε_u is a decay factor. H₀(t) is a column vector with length equal to the number of available output video signals, the i-th element of which is valued 1 if the i-th available output video signal is not selected, and 0 if the i-th available output video signal is selected. Thus, if a particular video output signal displays a region of audio activity, yet is continually ignored in the selection process, it accumulates a high audio undercoverage value. The decay factor ensures that periods of undercoverage occurring long ago are given less consideration than those occurring more recently.

[0084] Audio Overcoverage

[0085] Complementary to audio undercoverage, audio overcoverage decreases the desirability of an available video output signal that is selected even though it does not display a region within the teleconference site containing current audio activity. Specifically,

−A^o(t) = ε_o A^o(t−Δt) + H₁(t) ∘ (I − U a(t)) Δt  (9)

[0086] where I is a column vector of ones, and H₁(t) is a column vector with length equal to the number of available output video signals, the i-th element of which is valued 0 if the i-th available output video signal is not selected, and 1 if the i-th available output video signal is selected. As with audio undercoverage, ε_o is a decay factor. The negative sign reflects the fact that an increase in audio overcoverage decreases the desirability of an available video output signal.
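
As an illustrative sketch only, the recursive updates of equations (8) and (9) may be written as follows; the decay factors and time step values are arbitrary, and the element-by-element product follows the definitions given above.

```python
import numpy as np

def update_coverage_terms(A_u, A_o, a, H1, U, dt=0.5, eps_u=0.95, eps_o=0.95):
    """One time step of the audio undercoverage (8) and overcoverage (9) accumulators.

    A_u, A_o : accumulator vectors from time t - dt, one element per video output signal
    a        : audio activity vector from the microphones (elements 0 or 1)
    H1       : selection vector, 1 where a video output signal is currently selected
    U        : audibility matrix mapping microphone activity onto video output signals
    The decay factors eps_u, eps_o and the interval dt are illustrative values only.
    """
    A_u = np.asarray(A_u, dtype=float)
    A_o = np.asarray(A_o, dtype=float)
    H1 = np.asarray(H1, dtype=float)
    H0 = 1.0 - H1                                     # 1 where a signal is NOT selected
    mapped = U @ np.asarray(a, dtype=float)           # audio activity mapped onto the signals
    A_u_new = eps_u * A_u + H0 * mapped * dt          # grows for unselected signals showing audio activity
    A_o_new = eps_o * A_o + H1 * (1.0 - mapped) * dt  # grows for selected signals showing no activity
    return A_u_new, A_o_new                           # A_o enters the desirability with a negative sign
```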

[0087] Note that the notions of undercoverage and overcoverage are easily extended to any other type of activity, for example motion activity, to create motion undercoverage and motion overcoverage terms.

[0088] Saturation Component

[0089] The saturation component reflects the boredom that may result if one particular available video output signal is selected disproportionately more often than others. The saturation component decreases the desirability of a particular video output signal when the signal is currently selected, and increases the desirability when the signal is not selected.

[0090] Mathematically,

−S(t) = ε_s S(t−Δt) + H₁(t) Δt  (10)

[0091] where ε_s is a decay factor. The saturation component has a smoothing effect on the selection of video output signals and, apart from other components, ensures that all video output signals are selected at least part of the time. The negative sign reflects the fact that an increase in saturation decreases the desirability of a particular video output signal.
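
A corresponding illustrative sketch of the saturation update of equation (10), again with arbitrary values:

```python
import numpy as np

def update_saturation(S, H1, dt=0.5, eps_s=0.9):
    """One time step of the saturation accumulator of equation (10).

    The accumulator grows for the currently selected video output signal (H1 = 1)
    and decays for the others; it is subtracted from the desirability, per the
    negative sign in equation (10). eps_s and dt are illustrative values only.
    """
    return eps_s * np.asarray(S, dtype=float) + np.asarray(H1, dtype=float) * dt
```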

[0092] Continuity Component

[0093] The continuity component reflects the impact the selection of a particular video output signal would have on the continuity of the progression of video output signal selections. Qualitatively, the selection of a particular video output signal may appear smooth and seamless, exhibiting a high level of continuity, or abrupt and confusing, exhibiting a low level of continuity. The continuity component is calculated based on contributions from four continuity terms, i.e. spatial continuity, contextual continuity, rapid switching continuity, and sustained switching continuity. Associated with each of these terms is a multiplicative weighting that reflects the relative importance of each term in determining the continuity component of the desirability. Mathematically,

C(t) = k_s C^s(t) + k_c C^c(t) + k_r C^r(t) + k_b C^b(t)  (11)

[0094] Spatial Continuity

[0095] The spatial continuity term decreases the desirability of those video output signals that, if selected, would demand a great shift in the mental focus of a remote participant viewing the progression of selected signals. The decrease in desirability for a particular video output signal is thus dependent on the currently selected signal. These penalties may be summarized in a spatial continuity matrix. In the spatial continuity matrix, the penalty associated with a transition from the j-th video output signal to the i-th video output signal is given by the matrix element s_ij. For example, for the subset of available video output signals given earlier, the spatial continuity matrix may be given by

$S = \begin{bmatrix} 0 & 1 & 0.5 & 0.75 \\ 1 & 0 & 0.5 & 0.75 \\ 0.5 & 0.5 & 0 & 1 \\ 0.75 & 0.75 & 1 & 0 \end{bmatrix} \quad (12)$

[0096] The spatial continuity is then evaluated as

−C^s(t) = S H₁(t)  (13)

[0097] where the negative sign reflects the fact that a greater shift in spatial continuity decreases the desirability of an available video output signal. The spatial continuity matrix indicates that the penalty associated with a transition from a head-on view of participant A to a head-on view of participant B is 1, indicating a rather abrupt shift, while the transition to a split-screen view of participants A and B is assigned a penalty of only 0.5. By definition, all diagonal elements s_ii are zero-valued. In many cases, the matrix is also symmetric, with s_ij = s_ji. However, configurations may arise in which the change in continuity associated with a transition from one video output signal to another is not equal to that associated with the reciprocal transition.
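
The following illustrative reading of equation (13) uses the example matrix of equation (12); the choice of currently selected signal is arbitrary.

```python
import numpy as np

# Spatial continuity matrix S of equation (12); rows and columns are ordered
# (A, B, Split A-B, Group). Element s_ij is the penalty for switching to
# signal i while signal j is currently selected.
S = np.array([
    [0.0,  1.0,  0.5,  0.75],
    [1.0,  0.0,  0.5,  0.75],
    [0.5,  0.5,  0.0,  1.0 ],
    [0.75, 0.75, 1.0,  0.0 ],
])

H1 = np.array([0.0, 1.0, 0.0, 0.0])   # the head-on view of participant B is currently selected
C_s = -(S @ H1)                       # equation (13): penalties enter with a negative sign
# C_s == [-1.0, 0.0, -0.5, -0.75]: switching to A costs the most, staying on B costs nothing.
```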

[0098] Contextual Continuity

[0099] The contextual continuity term reflects how distant a particular video output signal is in the memory of a remote participant viewing the progression of selected signals. Qualitatively, the currently selected video output signal is perfectly in context, while a video output signal not selected recently is out of context because it requires the observing participant to search his memory to place the signal in context if it is displayed.

[0100] The contextual continuity of the i-th video output signal is given by

$C_i^c(t) = \begin{cases} 1 & \text{if the } i\text{-th video output signal is currently selected} \\ \varepsilon_c \, C_i^c(t - \Delta t) & \text{otherwise} \end{cases} \quad (14)$

[0101] where ε_c is again a decay factor. Thus, when a video output signal is newly selected, it is very fresh in an observer's memory and is assigned a high desirability. The desirability is decreased for those video output signals not currently selected until they are selected again.

[0102] Rapid Switching Continuity

[0103] The rapid switching continuity term reflects the discomfort experienced by conference participants viewing the progression of selected video output signals when two different video output signals are selected in rapid succession. When a video output signal is newly selected, all other video output signals are penalized. While the newly selected video output signal remains the currently selected video output signal, the penalty is decreased over time.

[0104] Specifically, upon selection of a new video output signal,

$C_i^r(t) = \begin{cases} 0 & \text{if the } i\text{-th video output signal is the newly selected signal} \\ -1 & \text{otherwise} \end{cases} \quad (15)$

[0105] While the newly selected video output signal remains the currently selected video output signal,

$C_i^r(t) = \begin{cases} 1 & \text{if the } i\text{-th video output signal is currently selected} \\ \varepsilon_r \, C_i^r(t - \Delta t) & \text{otherwise} \end{cases} \quad (16)$

[0106] The decay factor ε_r is chosen to reflect the time scale over which the discomfort associated with a newly selected video output signal diminishes.

[0107] Sustained Switching Continuity

[0108] The sustained switching continuity term decreases the desirability of video output signals that, if selected and provided to the output devices, would lead to a frenetic progression of selected signals. It penalizes the selection of any video output signal other than the currently selected video output signal. The magnitude of the penalty is proportional to a recent switching rate reflecting the frequency with which newly selected shots have been selected over a recent time period. The sustained switching continuity term imparts a sense of inertia, or damping, to the progression of selected video output signals. Mathematically,

$C_i^b(t) = \begin{cases} 0 & \text{if the } i\text{-th video output signal is currently selected} \\ -\beta(t, \tau) & \text{otherwise} \end{cases} \quad (17)$

[0109] Here, β(t,τ) is a parameter that reflects the recent history switching rate, i.e. the average number of switches per unit time over the recent time period τ.
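
Purely for illustration, one possible way to estimate β(t, τ) and apply equation (17) is sketched below; the windowing scheme and names are assumptions rather than part of the disclosure.

```python
import numpy as np

def sustained_switching_term(switch_times, t, tau, current_index, n_signals):
    """Equation (17): penalize every signal except the currently selected one,
    in proportion to the recent switching rate beta(t, tau).

    switch_times : times at which a new video output signal was selected
    t, tau       : the current time and the length of the recent time window
    """
    recent = [s for s in switch_times if t - tau <= s <= t]
    beta = len(recent) / tau               # average number of switches per unit time over tau
    C_b = np.full(n_signals, -beta)        # penalty applied to every signal ...
    C_b[current_index] = 0.0               # ... except the currently selected one
    return C_b

# Example: four switches in the last 20 seconds -> beta = 0.2 switches per second.
print(sustained_switching_term([81, 85, 90, 96], t=100, tau=20, current_index=3, n_signals=4))
```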

[0110] Participant Priorities

[0111] In the preferred embodiment of the invention, the desirability is further modified by considering the relative importance of the local conference participants. Each participant is assigned a priority, which is then mapped onto the available video output signals with a participant visibility matrix. Specifically, a priority weighted desirability, D′(t), may be evaluated as

D′(t) = D(t) ∘ V p  (18)

[0112] where p is a column vector containing the participant priorities, and V is the participant visibility matrix. Element v_ij of matrix V indicates the visibility of the j-th participant in the i-th video output signal. The effect of the participant priorities is thus easily nullified by assigning all participants an equal priority.
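
An illustrative sketch of the priority weighting of equation (18) follows; the visibility values and priorities are arbitrary examples, and the element-by-element product mirrors the reconstruction of equation (18) above.

```python
import numpy as np

# Participant visibility matrix V: element v_ij is the visibility of participant j
# in video output signal i (rows ordered A, B, Split A-B, Group; columns A, B, C).
V = np.array([
    [1.0,  0.0,  0.0],
    [0.0,  1.0,  0.0],
    [0.5,  0.5,  0.0],
    [1/3,  1/3,  1/3],
])
p = np.array([1.0, 1.0, 2.0])   # participant C is given twice the priority of A and B

def priority_weighted_desirability(D):
    """Equation (18): D'(t) = D(t) ∘ (V p), an element-by-element product."""
    return np.asarray(D, dtype=float) * (V @ p)

# With equal priorities, V p is a vector of ones for this V, so D is unchanged,
# consistent with the nullification described above.
```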

[0113] The behavior of the central processor as it executes the preceding method of evaluating the desirability of each possible output configuration is controlled in large part by the values of the weightings, matrix elements, decay factors, and participant priorities. In the preferred embodiment of the invention, some or all of these values are user adjustable parameters that may be varied by one or more of the conference participants. The participants may then adjust the behavior of the selection process to suit the needs of a particular conference. The values are preferably presented on a touch screen flat panel interface allowing for intuitive access to and control of the user adjustable parameters. Alternatively, various preset configurations may be provided from which users may select a most appropriate selection process behavior.

[0114] A useful simplification is achieved by restricting the user adjustable parameters to the component weightings, term weightings, and participant priorities. These values are varied to suit the changing dynamics of a particular conference. The matrix elements and decay factors, however, are more reflective of a particular teleconference site geometry, and may thus remain fixed throughout an individual conference.

[0115] The central processor also preferably supports the loading of program modules that provide a set of values designed to match a specific conference style. For example, specialized modules may be created for board meetings, staff meetings, and design team meetings. Further, the values need not remain constant through a conference, but instead may change to reflect the differing dynamics of the beginning, middle, and end of a meeting.
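
As a purely hypothetical illustration of such program modules, adjustable parameter values might be grouped by conference style as follows; none of these values are taken from the disclosure.

```python
# Hypothetical conference-style modules: each provides one set of adjustable
# parameter values (component weightings and term weightings).
CONFERENCE_STYLES = {
    "board_meeting": {           # favors stability: strong continuity, slow switching
        "K_A": 1.0, "K_S": 0.3, "K_C": 1.5,
        "activity_terms":   {"k_a": 1.0, "k_m": 0.2, "k_u": 0.5, "k_o": 0.5},
        "continuity_terms": {"k_s": 1.0, "k_c": 0.5, "k_r": 1.5, "k_b": 1.5},
    },
    "design_team_meeting": {     # favors responsiveness: strong activity weighting
        "K_A": 1.5, "K_S": 0.8, "K_C": 0.5,
        "activity_terms":   {"k_a": 1.0, "k_m": 0.8, "k_u": 1.0, "k_o": 0.5},
        "continuity_terms": {"k_s": 0.5, "k_c": 0.3, "k_r": 0.8, "k_b": 0.5},
    },
}

def load_style(name: str) -> dict:
    """Return the parameter set for the named conference style."""
    return CONFERENCE_STYLES[name]
```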

[0116] As noted, the invention may incorporate any of a number of input and output devices to achieve a wide range of functionality. In an alternative embodiment of the invention, facsimile machines at the local and remote conference sites serve as input devices, sensors, and output devices. The scanned image of a document inserted into the local facsimile machine is treated as an input signal. A sensor on the facsimile machine indicates to the central processor that there is facsimile activity at the local conference site. Local facsimile activity in turn induces a large increase in the desirability associated with routing the facsimile input signal as an output signal to the remote facsimile machine.

[0117] In another embodiment, a security badge reader is included as a sensor. The central processor computes a current security level for the conference, defined by the lowest security level among the conference participants currently at the conference site. If a new participant with a security level below the current security level enters the conference site, the security level of the conference is lowered. The lowering of the conference security level induces a dramatic, essentially infinite lowering of the desirability of routing any output signals containing sensitive material to output devices viewable by the participant with a lower security level.

[0118] Although the invention is described herein with reference to several embodiments, including the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the invention.

[0119] Accordingly, the invention should only be limited by the following claims.

1. A method for conferencing, comprising the steps of: receiving at least one input signal from at least one input device; receiving descriptive information during a conference from at least one sensor; determining, in a substantially continuous manner, based on said descriptive information, a desirability for each of a plurality of possible output configurations, each of said possible output configurations specifying a routing of at least one output signal to at least one output device; selecting a most desirable output configuration among said plurality of possible output configurations; and providing said at least one output signal to said at least one output device in accordance with said most desirable output configuration.
2. The method of claim 1, wherein at least one input signal is received via a telecommunications network.
3. The method of claim 1, further comprising a plurality of input devices located at a plurality of physically disparate locations.
4. The method of claim 1, wherein at least one output signal is provided via a communications network.
5. The method of claim 1, further comprising a plurality of output devices located at a plurality of physically disparate locations.
6. The method of claim 1, wherein at least one input device is an audio device.
7. The method of claim 6, wherein said audio device is any of: an analog audio source; and a digital audio source.
8. The method of claim 1, wherein at least one input device is a video device.
9. The method of claim 8, wherein said video device is any of: a video camera; a television signal source; a video effects processor; an analog video source; and a digital video source.
10. The method of claim 1, wherein at least one input device is a multimedia device.
11. The method of claim 10, wherein said multimedia device is any of: a document camera; a document scanner; a facsimile machine; and a personal computer.
12. The method of claim 1, wherein at least one output device is an audio device.
13. The method of claim 12, wherein said audio device is any of: a speaker; an amplifier; a signal processor; an analog audio recording device; and a digital audio recording device.
14. The method of claim 1, wherein at least one output device is a video device.
15. The method of claim 14, wherein said video device is any of: a television; a monitor; a computer display; a video effects processor; an analog video recording device; and a digital video recording device.
16. The method of claim 1, wherein at least one output device is a multimedia device.
17. The method of claim 16, wherein said multimedia device is any of: a printer; an overhead projector; a document projection system; a facsimile machine; and a personal computer.
18. The method of claim 1, wherein at least one input device reproduces from a storage medium a previously acquired input signal.
19. The method of claim 1, wherein at least one output device stores said output signal on a storage medium.
20. The method of claim 1, wherein said step of providing said at least one output signal comprises the step of composing at least one output signal from at least one input signal.
21. The method of claim 20, wherein said step of composing comprises the step of creating a video signal that represents a split-screen view of at least two video signal inputs.
22. The method of claim 20, wherein said step of composing comprises the step of associating at least one audio input signal with a video input signal.
23. The method of claim 20, wherein said step of composing comprises the step of selecting at least one input signal.
24. The method of claim 1, wherein said descriptive information received from said at least one sensor comprises information indicating any of: temperature; humidity; illumination level; an opening of a door; a presence of a conference participant in an entryway; a presence of a conference participant within a conference site; a change in a security level of said conference, said security level defined by a lowest security level among all conference participants; a seating of a conference participant; a standing up of a conference participant; a completed scanning of a document by a facsimile machine; an activity of a microphone input; a speaking activity of a conference participant; a speaking of a predetermined word by a conference participant; a posture of a conference participant; a gaze direction of a conference participant; a facial expression of a conference participant; and a gesture by a conference participant.
25. The method of claim 1, wherein said descriptive information received from said at least one sensor comprises information indicating any of: which of said at least one output signal are currently provided to said at least one output device; which of said at least one output signal were recently provided to said at least one output device; which of said possible output configurations is currently selected; and which of said possible output configurations were recently selected.
26. The method of claim 1, wherein at least one sensor also functions as one of said at least one input device.
27. The method of claim 1, wherein at least one input device also functions as one of said at least one sensor.
28. The method of claim 1, wherein said desirability is evaluated numerically.
29. The method of claim 28, wherein said most desirable output configuration has a greatest numerical desirability among said possible output configurations.
30. The method of claim 28, wherein said desirability comprises: a plurality of components, wherein each of said components is multiplied by a component weighting and then additively combined with all other of said components to yield said numerical desirability, and wherein said component weighting indicates a relative importance of said each of said components in determining said desirability.
31. The method of claim 30, wherein said component weighting is an adjustable parameter.
32. The method of claim 31, wherein said adjustable parameter is assigned a value that produces a selection of output configurations that is consistent with a desired conference style.
33. The method of claim 28, wherein said desirability comprises an activity component comprising at least one activity term indicating a relevance of said descriptive information to each of said possible output configurations.
34. The method of claim 33, wherein each of said at least one activity term is multiplied by an activity term weighting and then additively combined with all other of said at least one activity term to yield said activity component, and wherein said activity term weighting indicates a relative importance of said each of said at least one activity term in determining said activity component.
35. The method of claim 34, wherein said activity term weighting is an adjustable parameter.
36. The method of claim 35, wherein said adjustable parameter is assigned a value that produces a selection of output configurations that is consistent with a desired conference style.
37. The method of claim 33, wherein each of said at least one activity term comprises: a mapping of said descriptive information onto said possible output configurations.
38. The method of claim 37, wherein said descriptive information comprises: an activity vector; wherein said mapping comprises a matrix that contains elements which are arranged in at least one row and at least one column, wherein each of said at least one row corresponds to one of said possible output configurations and each of said at least one column corresponds to an element of said activity vector; and wherein said mapping is performed by multiplying said matrix by said activity vector.
39. The method of claim 38, wherein said elements of said matrix are adjustable parameters.
40. The method of claim 37, wherein said at least one input device comprises at least one video camera having a visible range; wherein said at least one sensor comprises at least one microphone having an audible range; wherein said at least one input signal comprises at least one video signal that is received from said at least one video camera; wherein said descriptive information comprises at least one microphone activity level that identifies an activity level of said at least one microphone; wherein said possible output configurations provide at least one video signal to at least one output device; and wherein said mapping comprises an audibility mapping that indicates an extent to which said audible range of each of said at least one microphone corresponds to said visible range of each of said at least one video camera.
41. The method of claim 40, wherein said at least one activity term comprises: an audio activity term that indicates a greater desirability for output configurations that incorporate output signals composed at least in part from video signals that correspond, as indicated by said audibility mapping, to active microphones.
42. The method of claim 40, wherein said descriptive information further indicates which of said at least one output signal are currently provided to which of said at least one output device; wherein said at least one activity term comprises an audio undercoverage term that indicates an increasing desirability for output configurations that incorporate output signals that are composed at least in part from a video signal that corresponds, as indicated by said audibility mapping, to active microphones, and that are not currently provided to at least one output device.
43. The method of claim 40, wherein said descriptive information further indicates which of said at least one output signal are currently provided to which of said at least one output device; wherein said at least one activity term comprises an audio overcoverage term which indicates a decreasing desirability for output configurations that incorporate output signals that are composed at least in part from a video signal that corresponds, as indicated by said audibility mapping, to inactive microphones, and that are currently provided to at least one output device.
44. The method of claim 28, wherein said descriptive information further indicates which of said at least one output signal are currently provided to which of said at least one output device; and wherein said desirability comprises a saturation component that indicates an increasing desirability for output configurations that incorporate output signals that are not currently provided to at least one output device, and that indicates a decreasing desirability for output configurations that incorporate output signals currently provided to at least one output device.
45. The method of claim 44, wherein said at least one input device comprises at least one video camera; wherein said at least one input signal comprises at least one video signal received from said at least one video camera; wherein said possible output configurations comprise providing at least one video signal to at least one output device; wherein said saturation component increases for output configurations that incorporate at least one video signal currently provided to at least one output device; and said saturation component decreases for output configurations that do not incorporate at least one video signal currently provided to at least one output device.
46. The method of claim 28, wherein said desirability comprises: a continuity component comprising at least one continuity term.
47. The method of claim 46, wherein each of said at least one continuity term is multiplied by a continuity term weighting and then is additively combined with all other of said at least one continuity term to yield said continuity component; and wherein said continuity term weighting indicates a relative importance of said each of said at least one continuity term in determining said continuity component.
48. The method of claim 47, wherein said continuity term weighting is an adjustable parameter.
49. The method of claim 48, wherein said adjustable parameter is assigned a value for producing a selection of output configurations that is consistent with a desired conference style.
50. The method of claim 46, wherein said descriptive information comprises: a history of recently provided output configurations; and wherein said at least one continuity term comprises a context continuity term that indicates a greater desirability for said recently provided output configurations.
51. The method of claim 46, wherein said descriptive information comprises: an indication of a current output configuration that is currently provided to said output devices.
52. The method of claim 51, wherein said at least one continuity term comprises: a rapid switching continuity term that indicates a greater desirability for said current output configuration and a lesser desirability for all other of said possible output configurations; and wherein said greater desirability exceeds said lesser desirability by an amount that attains a maximum value immediately upon selection of said current output configuration and that decays thereafter.
53. The method of claim 51, wherein said descriptive information further comprises: a recent history switching rate that reflects a time averaged rate at which output configurations that differ from said current output configuration are selected; wherein said at least one continuity term comprises a sustained switching continuity term that indicates a greater desirability for said current output configuration and a lesser desirability for all other of said possible output configurations; and wherein said greater desirability exceeds said lesser desirability by an amount that is proportional to said recent history switching rate.
54. The method of claim 51, wherein said at least one continuity term comprises: a spatial continuity term that indicates a greater desirability for output configurations that are similar to said current output configuration.
55. The method of claim 54, wherein said spatial continuity term comprises: a spatial continuity matrix that comprises elements arranged in at least one row and at least one column, where each of said at least one row corresponds to one of said possible output configurations, where each of said at least one column corresponds to one of said possible output configurations, and where an element located in an m^(th) row and an n^(th) column indicates a perceptual shift that is required of an observer of said at least one output device if an n^(th) output configuration is said current output configuration, and an m^(th) output configuration is selected and provided to said at least one output device.
56. The method of claim 1, wherein said desirability comprises: at least one participant priority that indicates a relative importance of each of at least one conference participant.
57. The method of claim 1, wherein said at least one input device comprises: at least one document scanning device; wherein said at least one input signal comprises a scanned document representation produced by said at least one document scanning device; wherein said at least one output device comprises at least one document production device that is capable of producing a document from said scanned document representation; wherein said at least one sensor comprises at least one detector that indicates a completed acquisition of said scanned document representation; wherein said determining step comprises the step of, upon said at least one detector indicating said completed acquisition, determining a most desirable destination document production device among said at least one document production device; and wherein said providing step comprises the step of providing said scanned document representation to said destination document production device.
58. The method of claim 57, wherein said at least one document scanning device comprises at least one digital document scanner, said scanned document representation is a digitally scanned document, and said at least one document production device comprises at least one printer.
59. The method of claim 1, wherein said at least one sensor comprises: a device for detecting a participant security level of a conference participant; wherein said at least one output signal each have an output signal security level; wherein said descriptive information comprises a conference security level equal to a lowest security level detected by said device for detecting a participant security level; and wherein said most desirable output configuration specifies output signals that each have an output signal security level less than said conference security level.
60. The method of claim 59, wherein said device for detecting a participant security level is a security badge reader.
61. A method for routing at least one output signal to at least one output device, comprising the steps of: continuously receiving at least one input signal obtained from at least one input device; continuously receiving descriptive information from at least one sensor; determining, based on said descriptive information, a desirability for each of a plurality of possible output configurations, each possible output configuration specifying said routing; and providing said output signals to said output device in accordance with a most desirable output configuration among said possible output configurations.
62. The method of claim 61, wherein said method is used to produce any of: a live broadcast television production; an edited version of a movie filmed with a plurality of movie cameras and recorded with a plurality of microphones; an edited version of a television program originally recorded with a plurality of television cameras and recorded with a plurality of microphones; a master security video signal from a plurality of security camera signals; a personalized selection of television programming from a cable television tuning device; and a video and musical program.
63. An apparatus for conferencing, comprising: at least one input for receiving at least one input signal from at least one input device; at least one sensor for receiving descriptive information during a conference; a processor for determining, in a substantially continuous manner, based on said descriptive information, a desirability for each of a plurality of possible output configurations, each of said possible output configurations specifying a routing of at least one output signal to at least one output device; said processor selecting a most desirable output configuration among said plurality of possible output configurations; and at least one output for providing said at least one output signal to said at least one output device in accordance with said most desirable output configuration.
64. The apparatus of claim 63, wherein said at least one output signal is composed from at least one selected input signal.
65. The apparatus of claim 63, wherein said descriptive information received from said at least one sensor comprises information indicating any of: temperature; humidity; illumination level; an opening of a door; a presence of a conference participant in an entryway; a presence of a conference participant within a conference site; a change in a security level of said conference, said security level defined by a lowest security level among all conference participants; a seating of a conference participant; a standing up of a conference participant; a completed scanning of a document by a facsimile machine; an activity of a microphone input; a speaking activity of a conference participant; a speaking of a predetermined word by a conference participant; a posture of a conference participant; a gaze direction of a conference participant; a facial expression of a conference participant; and a gesture by a conference participant.
66. The apparatus of claim 63, wherein said descriptive information received from said at least one sensor comprises information indicating any of: which of said at least one output signal are currently provided to said at least one output device; which of said at least one output signal were recently provided to said at least one output device; which of said possible output configurations is currently selected; and which of said possible output configurations were recently selected.
67. The apparatus of claim 63, wherein said desirability is evaluated numerically.
68. The apparatus of claim 67, wherein said most desirable output configuration has a greatest numerical desirability among said possible output configurations.
69. The apparatus of claim 67, wherein said desirability comprises: a plurality of components, wherein each of said components is multiplied by a component weighting and then additively combined with all other of said components to yield said numerical desirability, and wherein said component weighting indicates a relative importance of said each of said components in determining said desirability.
70. The apparatus of claim 69, wherein said desirability comprises an activity component comprising at least one activity term indicating a relevance of said descriptive information to each of said possible output configurations.
71. The apparatus of claim 70, wherein each of said at least one activity term is multiplied by an activity term weighting and then additively combined with all other of said at least one activity term to yield said activity component, and wherein said activity term weighting indicates a relative importance of said each of said at least one activity term in determining said activity component.
72. The apparatus of claim 70, wherein each of said at least one activity term comprises: a mapping of said descriptive information onto said possible output configurations.
73. The apparatus of claim 72, wherein said descriptive information comprises: an activity vector; wherein said mapping comprises a matrix that contains elements which are arranged in at least one row and at least one column, wherein each of said at least one row corresponds to one of said possible output configurations and each of said at least one column corresponds to an element of said activity vector; and wherein said mapping is performed by multiplying said matrix by said activity vector.
74. The apparatus of claim 63, wherein said at least one input device comprises at least one video camera having a visible range; wherein said at least one sensor comprises at least one microphone having an audible range; wherein said at least one input signal comprises at least one video signal that is received from said at least one video camera; wherein said descriptive information comprises at least one microphone activity level that identifies an activity level of said at least one microphone; wherein said possible output configurations provide at least one video signal to at least one output device; and further comprising: an audibility mapping that indicates an extent to which said audible range of each of said at least one microphone corresponds to said visible range of each of said at least one video camera.
75. The apparatus of claim 74, further comprising: an audio activity term that indicates a greater desirability for output configurations that incorporate output signals composed at least in part from video signals that correspond, as indicated by said audibility mapping, to active microphones.
76. The apparatus of claim 75, wherein said descriptive information further indicates which of said at least one output signal are currently provided to which of said at least one output device; wherein said at least one activity term comprises an audio undercoverage term that indicates an increasing desirability for output configurations that incorporate output signals that are composed at least in part from a video signal that corresponds, as indicated by said audibility mapping, to active microphones, and that are not currently provided to at least one output device.
77. The apparatus of claim 75, wherein said descriptive information further indicates which of said at least one output signal are currently provided to which of said at least one output device; wherein said at least one activity term comprises an audio overcoverage term which indicates a decreasing desirability for output configurations that incorporate output signals that are composed at least in part from a video signal that corresponds, as indicated by said audibility mapping, to inactive microphones, and that are currently provided to at least one output device.
78. The apparatus of claim 75, wherein said descriptive information further indicates which of said at least one output signal are currently provided to which of said at least one output device; and wherein said desirability comprises a saturation component that indicates an increasing desirability for output configurations that incorporate output signals that are not currently provided to at least one output device, and that indicates a decreasing desirability for output configurations that incorporate output signals currently provided to at least one output device.
79. The apparatus of claim 63, wherein said at least one input device comprises at least one video camera; wherein said at least one input signal comprises at least one video signal received from said at least one video camera; wherein said possible output configurations comprise providing at least one video signal to at least one output device; wherein said saturation component increases for output configurations that incorporate at least one video signal currently provided to at least one output device; and said saturation component decreases for output configurations that do not incorporate at least one video signal currently provided to at least one output device.
80. The apparatus of claim 63, further comprising: at least one continuity term that is multiplied by a continuity term weighting and then is additively combined with any other of said at least one continuity term to yield a continuity component; and wherein said continuity term weighting indicates a relative importance of said each of said at least one continuity term in determining said continuity component.
81. The apparatus of claim 80, wherein said descriptive information comprises: a history of recently provided output configurations; and wherein said at least one continuity term comprises a context continuity term that indicates a greater desirability for said recently provided output configurations.
82. The apparatus of claim 80, wherein said at least one continuity term comprises: a rapid switching continuity term that indicates a greater desirability for said current output configuration and a lesser desirability for all other of said possible output configurations; and wherein said greater desirability exceeds said lesser desirability by an amount that attains a maximum value immediately upon selection of said current output configuration and that decays thereafter.
83. The apparatus of claim 82, wherein said descriptive information further comprises: a recent history switching rate that reflects a time averaged rate at which output configurations that differ from said current output configuration are selected; wherein said at least one continuity term comprises a sustained switching continuity term that indicates a greater desirability for said current output configuration and a lesser desirability for all other of said possible output configurations; and wherein said greater desirability exceeds said lesser desirability by an amount that is proportional to said recent history switching rate.
84. The apparatus of claim 83, wherein said at least one continuity term comprises a spatial continuity term that indicates a greater desirability for output configurations that are similar to said current output configuration; and wherein said spatial continuity term comprises: a spatial continuity matrix that comprises elements arranged in at least one row and at least one column, where each of said at least one row corresponds to one of said possible output configurations, where each of said at least one column corresponds to one of said possible output configurations, and where an element located in an m^(th) row and an n^(th) column indicates a perceptual shift that is required of an observer of said at least one output device if an n^(th) output configuration is said current output configuration, and an m^(th) output configuration is selected and provided to said at least one output device.
85. The apparatus of claim 63, wherein said at least one input device comprises: at least one document scanning device; wherein said at least one input signal comprises a scanned document representation produced by said at least one document scanning device; wherein said at least one output device comprises at least one document production device that is capable of producing a document from said scanned document representation; wherein said at least one sensor comprises at least one detector that indicates a completed acquisition of said scanned document representation; wherein said processor, upon said at least one detector indicating said completed acquisition, determines a most desirable destination document production device among said at least one document production device; and wherein said processor provides said scanned document representation to said destination document production device.
86. The apparatus of claim 63, wherein said at least one sensor comprises: a device for detecting a participant security level of a conference participant; wherein said at least one output signal each have an output signal security level; wherein said descriptive information comprises a conference security level equal to a lowest security level detected by said device for detecting a participant security level; and wherein said most desirable output configuration specifies output signals that each have an output signal security level less than said conference security level.
87. An apparatus for routing at least one output signal to at least one output device, comprising: at least one input for continuously receiving at least one input signal obtained from at least one input device; at least one sensor for continuously receiving descriptive information; a processor for determining, based on said descriptive information, a desirability for each of a plurality of possible output configurations, each possible output configuration specifying said routing; and at least one output for providing said output signals to said output device in accordance with a most desirable output configuration among said possible output configurations.
88. The apparatus of claim 87, wherein said apparatus produces any of: a live broadcast television production; an edited version of a movie filmed with a plurality of movie cameras and recorded with a plurality of microphones; an edited version of a television program originally recorded with a plurality of television cameras and recorded with a plurality of microphones; a master security video signal from a plurality of security camera signals; a personalized selection of television programming from a cable television tuning device; and a video and musical program.
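The following non-limiting sketches, which are editorial illustrations rather than part of the claims, restate several of the computations recited above in Python. As a first example, claims 28-31 (and apparatus claims 67-69) recite a numerical desirability formed by weighting and additively combining a plurality of components, with the most desirable output configuration being the one with the greatest value. The function and variable names, the weights, and the configuration labels below are hypothetical.

    # Illustrative sketch of the weighted desirability of claims 28-31 and 67-69.
    # All names and numeric weights are hypothetical, not taken from the specification.

    def desirability(components, weights):
        """Combine per-configuration component scores into one numerical desirability.

        components: mapping from component name to its score for one configuration.
        weights:    mapping from component name to its component weighting, an
                    adjustable parameter expressing relative importance.
        """
        return sum(weights[name] * score for name, score in components.items())

    def select_configuration(scored_configurations, weights):
        """Return the configuration with the greatest numerical desirability."""
        return max(scored_configurations,
                   key=lambda cfg: desirability(scored_configurations[cfg], weights))

    # Hypothetical example: two output configurations scored on three components.
    weights = {"activity": 1.0, "saturation": 0.5, "continuity": 0.8}
    scored = {
        "camera_1_to_display": {"activity": 0.9, "saturation": -0.2, "continuity": 0.1},
        "camera_2_to_display": {"activity": 0.4, "saturation": 0.3, "continuity": 0.6},
    }
    print(select_configuration(scored, weights))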
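Claims 37-40 (and apparatus claims 72-74) recite an activity term formed by multiplying a matrix, such as an audibility mapping whose rows correspond to output configurations and whose columns correspond to microphones, by an activity vector of microphone activity levels. A minimal sketch follows; the matrix entries and activity levels are hypothetical placeholders.

    # Illustrative sketch of the matrix-times-activity-vector mapping of claims 37-40.

    def activity_term(audibility_mapping, activity_vector):
        """Multiply the mapping matrix by the activity vector.

        audibility_mapping[m][n] indicates how well microphone n's audible range
        corresponds to the camera view used by output configuration m.
        Returns one activity score per output configuration.
        """
        return [sum(row[n] * activity_vector[n] for n in range(len(activity_vector)))
                for row in audibility_mapping]

    # Hypothetical example: three configurations, two microphones.
    audibility_mapping = [
        [1.0, 0.0],   # configuration 0 shows the view covered by microphone 0
        [0.0, 1.0],   # configuration 1 shows the view covered by microphone 1
        [0.5, 0.5],   # configuration 2 is a wide or split-screen view
    ]
    activity_vector = [0.8, 0.1]  # microphone 0 is currently the active one
    print(activity_term(audibility_mapping, activity_vector))  # -> [0.8, 0.1, 0.45]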
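Claims 44-45 (and apparatus claims 78-79) recite a saturation component under which configurations whose signals are not currently on screen become more desirable, while configurations already on screen become less so. A minimal sketch, assuming a simple per-signal bonus and penalty whose values are hypothetical:

    # Illustrative sketch of the saturation component of claims 44-45 and 78-79.

    def saturation_component(configuration_signals, currently_shown, bonus=0.3, penalty=0.3):
        """Score one candidate output configuration.

        configuration_signals: the set of output signals the candidate would provide.
        currently_shown:       the set of output signals currently provided to outputs.
        """
        score = 0.0
        for signal in configuration_signals:
            if signal in currently_shown:
                score -= penalty   # already on screen: decreasing desirability
            else:
                score += bonus     # fresh signal: increasing desirability
        return score

    print(saturation_component({"camera_1", "camera_2"}, {"camera_1"}))  # -> 0.0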
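Claims 52-55 (and apparatus claims 82-84) recite three continuity terms: a rapid switching term that peaks at the moment of selection and decays thereafter, a sustained switching term proportional to the recent switching rate, and a spatial continuity term read from a matrix of perceptual shifts between configurations. The decay constant, gain, and matrix values in the sketch below are hypothetical, and an exponential decay is only one possible realization of "decays thereafter".

    # Illustrative sketch of the continuity terms of claims 52-55 and 82-84.

    import math

    def rapid_switching_term(candidate, current, seconds_since_switch, peak=1.0, decay=0.5):
        """Favor the current configuration by an amount that is greatest immediately
        upon its selection and decays thereafter."""
        if candidate != current:
            return 0.0
        return peak * math.exp(-decay * seconds_since_switch)

    def sustained_switching_term(candidate, current, recent_switching_rate, gain=2.0):
        """Favor the current configuration by an amount proportional to the
        time-averaged rate of recent switching."""
        return gain * recent_switching_rate if candidate == current else 0.0

    def spatial_continuity_term(spatial_matrix, candidate_index, current_index):
        """Penalize the perceptual shift required when moving from the current
        (n-th) configuration to the candidate (m-th) configuration."""
        return -spatial_matrix[candidate_index][current_index]

    # Hypothetical example with two configurations.
    spatial_matrix = [[0.0, 0.7],
                      [0.7, 0.0]]
    print(rapid_switching_term("cfg_a", "cfg_a", seconds_since_switch=2.0))
    print(sustained_switching_term("cfg_a", "cfg_a", recent_switching_rate=0.25))
    print(spatial_continuity_term(spatial_matrix, candidate_index=1, current_index=0))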
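Finally, claims 59-60 (and apparatus claim 86) recite a security constraint: the conference security level is the lowest level detected among the participants, for example by a badge reader, and the selected configuration may only carry output signals whose security level is less than that conference level. The names, level numbers, and example configurations below are hypothetical.

    # Illustrative sketch of the security constraint of claims 59-60 and 86.

    def conference_security_level(participant_levels):
        """The conference is only as cleared as its least-cleared participant."""
        return min(participant_levels)

    def permitted_configurations(configurations, signal_levels, participant_levels):
        """Keep configurations whose output signals all have a security level less
        than the conference security level, as the claims recite."""
        ceiling = conference_security_level(participant_levels)
        return [cfg for cfg, signals in configurations.items()
                if all(signal_levels[s] < ceiling for s in signals)]

    # Hypothetical example: badge reader reports levels 3 and 1, so level-2 content is excluded.
    configurations = {"overview": ["camera_1"], "whiteboard": ["camera_1", "document_cam"]}
    signal_levels = {"camera_1": 0, "document_cam": 2}
    print(permitted_configurations(configurations, signal_levels, participant_levels=[3, 1]))
    # -> ['overview']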